An adaptive nonmonotone trust region
method for unconstrained optimization
problems based on a simple
subproblem


Z. Saeidian and M.R. Peyghami* 


Abstract 


Using a simple quadratic model in the trust region subproblem, a new
adaptive nonmonotone trust region method is proposed for solving
unconstrained optimization problems. Based on a slight modification of the
approach proposed in (J. Optim. Theory Appl. 158(2):626-635, 2013), a new
scalar approximation of the Hessian at the current point is provided. The
proposed method is equipped with a new adaptive rule for updating the radius
and an appropriate nonmonotone technique. Under some suitable and standard
assumptions, the local and global convergence properties of the new
algorithm, as well as its convergence rate, are investigated. Finally, the
practical performance of the proposed algorithm is verified on some test
problems and compared with some existing algorithms in the literature.


Keywords: Trust region methods; Adaptive radius; Nonmonotone tech- 
nique; Scalar approximation of the Hessian; Global convergence. 


1 Introduction 


In this paper, we deal with the following unconstrained optimization problem: 


$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$


where $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously
differentiable function. Two popular classes of optimization techniques for
solving (1) are line search and trust region methods; see, e.g., [9,17,18].
Line search methods refer to a procedure
in which one moves along a (descent) direction as long as a sufficient reduc- 
tion in the objective is achieved. On the other hand, in the classical trust 
region methods, a trial step is computed by minimizing a (quadratic) model 
of the objective function at the current point over a region around this point. 
Then, using the so-called trust region ratio, the trial step is accepted/rejected 
and the new point as well as the radius is updated accordingly. It has been 
shown that trust region methods have appropriate global and local conver- 
gence properties. These methods have been widely studied in the literature; 
see, e.g., [9, 12,17, 19, 24, 25]. 

Here, let us briefly describe one step of the classical trust region method. 
Given $x_k$, the trial step $d_k$ is computed by solving the following
subproblem:


$$\min \; q_k(d) = g_k^T d + \frac{1}{2} d^T B_k d \quad \text{s.t.} \quad \|d\| \le \Delta_k, \qquad (2)$$


where $g_k = \nabla f(x_k)$, $B_k$ is an $n \times n$ symmetric matrix which
is $\nabla^2 f(x_k)$ or its approximation, $\Delta_k > 0$ is the so-called
trust region radius, and $\|\cdot\|$ refers to the Euclidean norm. Based on
the so-called trust region ratio


$$r_k = \frac{f(x_k) - f(x_k + d_k)}{q_k(0) - q_k(d_k)}, \qquad (3)$$


one decides whether the trial step is accepted or rejected: given
$\mu \in (0,1)$, if $r_k \ge \mu$, then the trial step is accepted and the
new point is $x_{k+1} = x_k + d_k$. Otherwise, the trial step is rejected and
the current point remains unchanged for the next iteration. In both cases,
the trust region radius is updated appropriately.

In the monotone trust region methods, the sequence of the objective values
is monotonically decreasing. This may cause a slow convergence rate on some
problems. In order to overcome this disadvantage, nonmonotone strategies
have been introduced in the framework of trust region methods; see, e.g.,
[13,14]. A nonmonotone line search method was first proposed by Chamberlain
et al. in [8]. Grippo et al. in [13] introduced a nonmonotone technique for
Newton's method and developed it for unconstrained optimization in [14].
Despite the many advantages of Grippo's technique, it suffers from some
drawbacks [2,3,27]. In order to overcome these difficulties, Ahookhosh and
Amini in [2] and Ahookhosh et al. in [3] recently proposed a new
nonmonotone term as below:


$$R_k = \epsilon_k f_{l(k)} + (1 - \epsilon_k) f_k, \qquad (4)$$


where $f_k = f(x_k)$, $\epsilon_k \in [\epsilon_{\min}, \epsilon_{\max}] \subset [0,1]$
and $f_{l(k)}$ is Grippo's nonmonotone term, which is defined by


$$f_{l(k)} = \max_{0 \le j \le M(k)} f_{k-j}, \qquad (5)$$

where $M(0) = 0$ and, for $k \ge 1$, $M(k) = \min\{k, M\}$ for a given
positive integer $M$. They employed (4) in the trust region ratio (3) and
suggested nonmonotone trust region methods which are globally convergent.
The reported numerical results on test problems also confirm the efficiency
and robustness of these methods in practice.
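
As an illustration of how the nonmonotone term in (4) and (5) can be evaluated from the stored objective values, here is a minimal Python sketch. The function name `nonmonotone_term` and its argument layout are assumptions made for this example; they do not come from [2] or [3].

```python
from typing import Sequence


def nonmonotone_term(f_values: Sequence[float], eps_k: float, M: int) -> float:
    """Return R_k = eps_k * f_{l(k)} + (1 - eps_k) * f_k, following (4)-(5).

    f_values holds f_0, ..., f_k (hypothetical storage layout);
    the window length is M(k) = min(k, M), so M(0) = 0.
    """
    k = len(f_values) - 1
    window = min(k, M)                    # M(k)
    f_lk = max(f_values[k - window:])     # Grippo term: max of f_{k-j}, 0 <= j <= M(k)
    return eps_k * f_lk + (1.0 - eps_k) * f_values[k]


# Made-up objective values, only to exercise the function.
values = [10.0, 4.0, 6.0, 5.0]
r_short = nonmonotone_term(values, eps_k=0.5, M=2)   # window covers f_1, f_2, f_3
r_long = nonmonotone_term(values, eps_k=0.5, M=10)   # window covers f_0, ..., f_3
```

With `eps_k = 0` the term reduces to the monotone value $f_k$, and with `eps_k = 1` it reduces to Grippo's term $f_{l(k)}$.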

The radius updating strategy is a crucial point in trust region methods
[1,21,28]. In the classical trust region methods, the radius is simply
enlarged, shrunk or left unchanged based on the magnitude of $r_k$. Several
strategies have been introduced in the literature for updating the radius
and choosing the initial radius; see, e.g., [11,21-23,29]. Zhang et al. in
[29] proposed the radius update $\Delta_k = c^p \|g_k\| \|\hat{B}_k^{-1}\|$,
where $c \in (0,1)$, $p$ is a nonnegative integer and
$\hat{B}_k = B_k + iI$ is a positive definite matrix for some
$i \in \mathbb{N}$. Although Zhang's method uses more information about the
objective function for updating the radius, it requires an estimate of
$\|\hat{B}_k^{-1}\|$, which is costly. To reduce the computational cost of
Zhang's updating rule, a simple adaptive rule was proposed by Shi and Wang
in [23] according to


$$\Delta_k = c^p \frac{\|g_k\|^3}{g_k^T \hat{B}_k g_k},$$


where $c \in (0,1)$, $\hat{B}_k$ is a positive definite matrix and $p$ is a
nonnegative integer. Unlike Zhang's method, which updates the radius based
only on the current point information, some updating rules based on the
information of the last two iterates have been introduced; see, e.g.,
[15,29,30]. Among them, Li [15] proposed an adaptive trust region method in
which the radius is updated according to


$$\Delta_k = \frac{\|d_{k-1}\|}{\|y_{k-1}\|} \|g_k\|,$$


where $y_{k-1} = g_k - g_{k-1}$ and $d_{k-1} = x_k - x_{k-1}$.

The advantages of nonmonotone and adaptive techniques have been
simultaneously employed in the framework of trust region methods. Using the
adaptive strategy proposed in [15], Sang et al. in [20] introduced a
nonmonotone adaptive trust region method based on a simple subproblem for
large-scale unconstrained optimization problems, which makes full use of the
information in the last two iterates. The idea of the simple subproblem
originates from the fact that solving the subproblem (2) is costly,
especially when $B_k$ is a large-scale and dense matrix. Therefore, the
quasi-Newton technique is used to obtain $B_k$ from $B_{k-1}$ through a real
diagonal correction matrix $\Delta B_{k-1}$. Recently, Zhou et al. in [30]
constructed a simple subproblem according to the modification of the secant
condition of Wei in [26] and introduced a nonmonotone adaptive trust region
method based on the simple subproblem. Later, Biglari and Solimanpur in [7]
proposed another simple subproblem, with some properties superior to that of
[30], in which the approximation of the Hessian at the current point $x_k$
is computed by


$$\gamma_k = \gamma(x_k) = \frac{4(f_{k-1} - f_k) + 3 g_k^T d_{k-1} + g_{k-1}^T d_{k-1}}{d_{k-1}^T d_{k-1}}. \qquad (6)$$
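
To see what the scalar in (6) measures, the following Python sketch evaluates it on a quadratic function. It assumes that (6) reads $\gamma_k = [4(f_{k-1} - f_k) + 3 g_k^T d_{k-1} + g_{k-1}^T d_{k-1}] / (d_{k-1}^T d_{k-1})$; the matrix `H`, the points and the function names are hypothetical choices made for this example. For $f(x) = \frac{1}{2} x^T H x$ the numerator equals $d_{k-1}^T H d_{k-1}$, so $\gamma_k$ coincides with the Rayleigh quotient of $H$ along $d_{k-1}$.

```python
import numpy as np


def scalar_hessian_approx(f_prev, f_cur, g_prev, g_cur, d_prev):
    """gamma_k as read from (6), using values at x_{k-1} and x_k = x_{k-1} + d_{k-1}."""
    num = 4.0 * (f_prev - f_cur) + 3.0 * (g_cur @ d_prev) + g_prev @ d_prev
    return num / (d_prev @ d_prev)


# Hypothetical quadratic test function f(x) = 0.5 * x^T H x.
H = np.array([[3.0, 1.0], [1.0, 2.0]])


def f(x):
    return 0.5 * x @ H @ x


def grad(x):
    return H @ x


x_prev = np.array([1.0, -2.0])
d_prev = np.array([0.5, 0.25])
x_cur = x_prev + d_prev

gamma = scalar_hessian_approx(f(x_prev), f(x_cur), grad(x_prev), grad(x_cur), d_prev)
rayleigh = (d_prev @ H @ d_prev) / (d_prev @ d_prev)  # curvature of f along d_prev
```

On non-quadratic functions the two quantities generally differ, and the value from (6) can become negative when the function bends the other way along $d_{k-1}$.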


In this paper, we propose a new nonmonotone adaptive trust region method
based on a simple subproblem for unconstrained optimization problems. Our
approach is equipped with the nonmonotone technique proposed in [2,3],
and uses a slight modification of the secant condition in [7] for
constructing an approximation of the Hessian at the current point. Moreover,
a modified version of the adaptive strategy in [20] is employed in the
framework of the proposed algorithm. It is worth mentioning that the scalar
approximation of the Hessian based on the modified secant condition in [6]
is superior to the standard Barzilai-Borwein method and its modifications.
Under some standard assumptions, the global convergence property, as well as
the superlinear convergence rate, is established. Numerical results show the
efficiency of the proposed approach in practice in comparison with some
existing methods in the literature.

The rest of the paper is organized as follows. In Section 2, we present the
structure of the new nonmonotone adaptive trust region method in detail.
The global convergence property, as well as the rate of convergence, is
established in Section 3. Preliminary numerical results of applying the
proposed algorithm to some test problems are given in Section 4. Finally, we
conclude the paper with some remarks in Section 5.


2 The new algorithm 


In this section, we propose a new adaptive nonmonotone trust region method 
for solving unconstrained optimization problems. Our algorithm combines 
the nonmonotone technique as proposed in [2] with an improved scalar ap- 
proximation of the Hessian according to the modified secant equation as 
proposed in [6]. 

Let us describe one step of our new algorithm. For given $x_k$, the trial
step $d_k$ is computed by (approximately) solving the following simple
subproblem:


$$\min \; q_k(d) = g_k^T d + \frac{1}{2} \gamma(x_k) d^T d \quad \text{s.t.} \quad \|d\| \le \Delta_k, \qquad (7)$$


where $\gamma_k := \gamma(x_k)$ is a scalar approximation of the Hessian
matrix. Since $\gamma_k$, as defined by (6), may become negative in some
iterations, we slightly modify (6) and define $\gamma_k$ as below:


$$\gamma_k = \frac{4(f_{k-1} - f_k) + (3 + \eta_k) g_k^T d_{k-1} + g_{k-1}^T d_{k-1}}{d_{k-1}^T d_{k-1}}, \qquad (8)$$


where $\eta_k$ is computed by:


$$\eta_k = \begin{cases} \dfrac{4(f_k - f_{k-1}) - 3 g_k^T d_{k-1} - g_{k-1}^T d_{k-1} + \delta}{g_k^T d_{k-1}}, & \text{if } \gamma_k < 0, \\ 0, & \text{otherwise,} \end{cases}$$



where $\delta$ is a small positive number and the condition $\gamma_k < 0$
refers to the value given by (6). By this definition, it is easily seen
that $\gamma_k > 0$. Now, using $d_k$, the nonmonotone ratio is computed by:


$$r_k = \frac{R_k - f(x_k + d_k)}{\text{Pred}_k}, \qquad (9)$$


where $R_k$ is defined by (4) and $\text{Pred}_k = q_k(0) - q_k(d_k)$. For
given $\mu \in (0,1)$, the trial step is accepted whenever $r_k \ge \mu$;
otherwise it is rejected. In both cases, the radius is adaptively updated
according to $\Delta_k = \min\{\nu_k \|g_k\|/\gamma_k, \Delta_{\max}\}$,
where $\Delta_{\max} > 0$ is a threshold value for the radii and $\nu_k$ is
updated by:


$$\nu_{k+1} = \begin{cases} \sigma_0 \nu_k, & r_k < \mu_1, \\ \nu_k, & \mu_1 \le r_k < \mu_2, \\ \min\{\sigma_1 \nu_k, \nu_{\max}\}, & r_k \ge \mu_2, \end{cases} \qquad (10)$$


where $0 < \sigma_0 < 1 < \sigma_1$, $0 < \mu_1 < \mu_2 < 1$ and
$\nu_{\max} > 0$ are given numbers. The new point is given by
$x_{k+1} = x_k + d_k$ as long as $r_k \ge \mu$; otherwise, we set
$x_{k+1} = x_k$.
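
The safeguard in (8) and the update rule (10) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented for the example, the defaults `mu1 = 0.25`, `mu2 = 0.75` and `delta_max = 100` echo values legible in Section 4, and `sigma0`, `sigma1`, `nu_max` and `delta` are hypothetical. The radius formula assumes the reading $\Delta_k = \min\{\nu_k \|g_k\|/\gamma_k, \Delta_{\max}\}$. When the safeguard is triggered, the numerator of (8) collapses to $\delta$, so $\gamma_k = \delta / (d_{k-1}^T d_{k-1})$.

```python
import numpy as np


def safeguarded_gamma(f_prev, f_cur, g_prev, g_cur, d_prev, delta=1e-8):
    """Scalar Hessian approximation following (8), with eta_k as defined there."""
    dd = float(d_prev @ d_prev)
    gd = float(g_cur @ d_prev)
    num = 4.0 * (f_prev - f_cur) + 3.0 * gd + float(g_prev @ d_prev)
    eta = 0.0
    # The text does not cover g_k^T d_{k-1} = 0; this sketch keeps eta_k = 0 there.
    if num / dd < 0.0 and gd != 0.0:
        eta = (delta - num) / gd
    return (num + eta * gd) / dd


def update_nu(nu, r, mu1=0.25, mu2=0.75, sigma0=0.5, sigma1=2.0, nu_max=100.0):
    """Rule (10): shrink nu on poor agreement, enlarge it (up to nu_max) on good agreement."""
    if r < mu1:
        return sigma0 * nu
    if r < mu2:
        return nu
    return min(sigma1 * nu, nu_max)


def adaptive_radius(nu, g, gamma, delta_max=100.0):
    """Radius min{nu * ||g|| / gamma, delta_max} (assumed reading of the update)."""
    return min(nu * float(np.linalg.norm(g)) / gamma, delta_max)


# Made-up data with negative curvature along d, so that the safeguard is triggered.
gamma_neg = safeguarded_gamma(0.0, 1.0, np.zeros(2), np.array([0.5, 0.0]),
                              np.array([1.0, 0.0]))
```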

The procedure of the new proposed nonmonotone trust region algorithm is 
outlined in Algorithm 1: 


Algorithm 1: A new nonmonotone adaptive trust region algorithm 


Input: $x_0 \in \mathbb{R}^n$, $0 < \mu < \mu_1 < \mu_2 < 1$,
$0 < \sigma_0 < 1 < \sigma_1$, $0 < \epsilon_{\min} < \epsilon_{\max} < 1$,
$\varepsilon$, $\epsilon$, $M$, $\nu_{\max}$, $\Delta_{\max} > 0$,
$0 < \theta_1 < \theta_2$ and $\delta > 0$.


Step 0: Set $k = 0$, $\gamma_0 := \gamma(x_0) = 1$, $g_0 = g(x_0)$,
$\nu_0 = 1$ and $\Delta_0 = \min\{\nu_0 \|g_0\|/\gamma_0, \Delta_{\max}\}$.

Step 1: If $\|g_k\| \le \varepsilon$, then stop.

Step 2: Determine $d_k$ by solving (7) and compute $r_k$ using (9).

Step 3: If $r_k < \mu$, then set $\Delta_k = \sigma_0 \Delta_k$ and go to
Step 2.

Step 4: Set $x_{k+1} = x_k + d_k$.


Step 5: Compute $\gamma_{k+1}$ using (8). If $\gamma_{k+1} \le \epsilon$,
then set $\gamma_{k+1} = \theta_1$. If $\gamma_{k+1} \ge 1/\epsilon$, then
set $\gamma_{k+1} = \theta_2$.


Step 6: Update $\nu_{k+1}$ using (10) and set
$\Delta_{k+1} = \min\{\nu_{k+1} \|g_{k+1}\|/\gamma_{k+1}, \Delta_{\max}\}$.
Set $k := k + 1$ and go to Step 1.

Remark 1. Step 5 of Algorithm 1 implies that $\gamma_k$ is a bounded
positive number for all $k$. More precisely, we have
$\min\{\epsilon, \theta_1\} \le \gamma_k \le \max\{1/\epsilon, \theta_2\}$.


Remark 2. The subproblem (7) can be easily solved by using the following
procedure [20]: Let $w_k = g_k/\gamma_k$. If $\|w_k\| \le \Delta_k$, then we
set the trial step as $d_k = -w_k$. Otherwise, we choose $\alpha \in (0,1)$
so that $\|\alpha w_k\| = \Delta_k$. It can be easily verified that
$\alpha = \Delta_k/\|w_k\|$. In this case, we set
$d_k = -\alpha w_k = -\frac{\Delta_k}{\|w_k\|} w_k = -\frac{\Delta_k}{\|g_k\|} g_k$.


Remark 3. From Remark 2, one can easily see that, for all $k$, there exists
a positive constant $\kappa$ so that $\|d_k\| \le \kappa \|g_k\|$.
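
Putting the pieces together, one possible reading of Algorithm 1 with the closed-form step of Remark 2 is sketched below in Python. This is not the authors' MATLAB implementation. The name `fatra_sketch`, the fixed weight `eps_k`, the cap of 60 inner iterations and the defaults for `sigma0`, `sigma1`, `nu_max`, `eps_gamma`, `theta1`, `theta2`, `delta` and `tol` are all assumptions made for this example; `mu`, `mu1`, `mu2`, `M` and `delta_max` echo values legible in Section 4. Step 5 is read as safeguarding $\gamma_{k+1}$ against the thresholds $\epsilon$ and $1/\epsilon$.

```python
import numpy as np


def fatra_sketch(f, grad, x0, mu=0.1, mu1=0.25, mu2=0.75, sigma0=0.5, sigma1=2.0,
                 eps_k=0.5, M=10, nu_max=100.0, delta_max=100.0, eps_gamma=1e-5,
                 theta1=1e-5, theta2=1e5, delta=1e-8, tol=1e-6, max_iter=1000):
    """Illustrative reading of Algorithm 1; most defaults are assumptions."""
    x = np.asarray(x0, dtype=float)
    fx, g = f(x), grad(x)
    gamma, nu = 1.0, 1.0                              # Step 0
    history = [fx]                                    # accepted values f_0, ..., f_k
    for k in range(max_iter):
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:                              # Step 1
            break
        radius = min(nu * gnorm / gamma, delta_max)
        f_lk = max(history[-(min(k, M) + 1):])        # Grippo term (5)
        R = eps_k * f_lk + (1.0 - eps_k) * fx         # nonmonotone term (4)
        for _ in range(60):                           # Steps 2-3, capped in this sketch
            if gnorm / gamma <= radius:               # Remark 2: interior step
                d = -g / gamma
            else:                                     # Remark 2: boundary step
                d = -(radius / gnorm) * g
            pred = -(g @ d) - 0.5 * gamma * (d @ d)   # q_k(0) - q_k(d_k)
            f_new = f(x + d)
            r = (R - f_new) / pred                    # ratio (9)
            if r >= mu:
                break
            radius *= sigma0                          # Step 3
        else:
            break                                     # no step accepted within the cap
        x_new = x + d                                 # Step 4
        g_new = grad(x_new)
        gd = g_new @ d                                # Step 5: gamma from (8)
        num = 4.0 * (fx - f_new) + 3.0 * gd + g @ d
        if num < 0.0 and gd != 0.0:
            num += ((delta - num) / gd) * gd          # adds eta_k * g_k^T d_{k-1}
        gamma = num / (d @ d)
        if gamma <= eps_gamma:
            gamma = theta1
        elif gamma >= 1.0 / eps_gamma:
            gamma = theta2
        if r < mu1:                                   # Step 6: rule (10)
            nu = sigma0 * nu
        elif r >= mu2:
            nu = min(sigma1 * nu, nu_max)
        x, fx, g = x_new, f_new, g_new
        history.append(fx)
    return x, fx


# Hypothetical usage on f(x) = 0.5 * c * ||x||^2 with c = 4.
c = 4.0
x_star, f_star = fatra_sketch(lambda x: 0.5 * c * (x @ x), lambda x: c * x, [1.0, 1.0])
```

The inner loop only shrinks the radius, as in Step 3, so every trial step stays on the negative gradient direction and costs one function evaluation.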


3 Convergence analysis 


In this section, our aim is to analyze the local and global convergence prop- 
erties of Algorithm 1. For this purpose, the following assumption is imposed 
on the problem: 


A1. The set $\Omega = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$ is a
closed and bounded set and $f(x)$ is a twice continuously differentiable
function over $\Omega$. Moreover, $\nabla f(x)$ is a Lipschitz continuous
function over $\Omega$.


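Lemma 1. Assume that $d_k$ is a solution of the problem (7). Then, one has:

$$\text{Pred}_k := q_k(0) - q_k(d_k) \ge \frac{1}{2} \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\gamma_k}\right\}. \qquad (11)$$

Proof. We consider the following two possible cases for $d_k$:


Case I. $\|-\frac{g_k}{\gamma_k}\| \le \Delta_k$, and therefore,
$d_k = -\frac{g_k}{\gamma_k}$. In this case, one can easily obtain the
following relations:


$$\begin{aligned}
q_k(0) - q_k(d_k) &= -g_k^T d_k - \frac{1}{2} \gamma_k d_k^T d_k \\
&= -g_k^T \left(-\frac{g_k}{\gamma_k}\right) - \frac{1}{2} \gamma_k \left(-\frac{g_k}{\gamma_k}\right)^T \left(-\frac{g_k}{\gamma_k}\right) \\
&= \frac{\|g_k\|^2}{\gamma_k} - \frac{1}{2} \frac{\|g_k\|^2}{\gamma_k} = \frac{1}{2} \frac{\|g_k\|^2}{\gamma_k} \ge \frac{1}{2} \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\gamma_k}\right\}.
\end{aligned}$$


Case II. $\|-\frac{g_k}{\gamma_k}\| > \Delta_k$, and therefore,
$d_k = -\frac{\Delta_k}{\|g_k\|} g_k$. In this case, we have:

$$\begin{aligned}
q_k(0) - q_k(d_k) &= q_k(0) - q_k\left(-\frac{\Delta_k}{\|g_k\|} g_k\right) \\
&= -g_k^T \left(-\frac{\Delta_k}{\|g_k\|} g_k\right) - \frac{1}{2} \gamma_k \left(\frac{\Delta_k}{\|g_k\|} g_k\right)^T \left(\frac{\Delta_k}{\|g_k\|} g_k\right) \\
&= \Delta_k \|g_k\| - \frac{1}{2} \gamma_k \Delta_k^2 \ge \Delta_k \|g_k\| - \frac{1}{2} \Delta_k \|g_k\| \\
&= \frac{1}{2} \Delta_k \|g_k\| \ge \frac{1}{2} \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\gamma_k}\right\},
\end{aligned}$$


where the first inequality is obtained from the fact that
$\gamma_k \Delta_k < \|g_k\|$.


Considering the above mentioned cases, the proof is completed.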
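
The bound (11) can also be checked numerically. The following Python sketch draws synthetic data, computes the trial step with the procedure of Remark 2, and counts how often the predicted reduction falls below the right-hand side of (11), up to a small rounding allowance. The function names, the sampling ranges and the seed are assumptions made for this check.

```python
import numpy as np


def trial_step(g, gamma, radius):
    """Closed-form solution of (7) as described in Remark 2."""
    gnorm = np.linalg.norm(g)
    if gnorm / gamma <= radius:
        return -g / gamma               # Case I: interior step
    return -(radius / gnorm) * g        # Case II: step on the boundary


def predicted_reduction(g, gamma, d):
    """q(0) - q(d) for the model q(d) = g^T d + 0.5 * gamma * d^T d."""
    return -(g @ d) - 0.5 * gamma * (d @ d)


rng = np.random.default_rng(0)          # arbitrary seed; the data are synthetic
violations = 0
for _ in range(1000):
    g = rng.normal(size=5)
    gamma = rng.uniform(1e-3, 1e3)
    radius = rng.uniform(1e-3, 1e3)
    pred = predicted_reduction(g, gamma, trial_step(g, gamma, radius))
    bound = 0.5 * np.linalg.norm(g) * min(radius, np.linalg.norm(g) / gamma)
    if pred < bound * (1.0 - 1e-12):    # relative slack for floating-point rounding
        violations += 1
```

In Case I the two sides of (11) agree up to rounding, which is why the comparison uses a relative tolerance rather than a strict inequality.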


Lemma 2. Let $d_k$ be computed by the procedure described in Remark 2.
Then, for all $k$, one has:


$$|f(x_k) - f(x_k + d_k) - \text{Pred}_k| \le O(\|d_k\|^2), \qquad (12)$$


where $\text{Pred}_k$ is defined by (11).

Proof. Using Taylor's expansion and the fact that $\gamma_k$ is bounded due
to Remark 1, one can easily conclude the result.

The following lemma states some appealing properties of the sequences
$\{f_{l(k)}\}$ and $\{R_k\}$, which are defined by (5) and (4),
respectively. One can find its proof in [2].


Lemma 3. Suppose that Assumption A1 holds and the sequence $\{x_k\}$ is
generated by Algorithm 1. Then, the following statements hold:


i) For all $k$, we have $f_k \le R_k \le f_{l(k)}$.

ii) The sequence $\{f_{l(k)}\}$ is a decreasing and convergent sequence.

iii) $\lim_{k \to \infty} f_{l(k)} = \lim_{k \to \infty} f_k$.

iv) $\lim_{k \to \infty} R_k = \lim_{k \to \infty} f_k$.


Lemma 4. Let Assumption A1 hold and the sequence $\{x_k\}$ be generated by
Algorithm 1. Assume that there exists a constant $\varsigma \in (0,1)$ so
that $\|g_k\| \ge \varsigma$ for all $k$. Then, for any $k$, there exists a
nonnegative integer $p$ so that $x_{k+p+1}$ is a successful iteration point,
i.e., $r_{k+p} \ge \mu$.

Proof. Suppose that, on the contrary, there exists an iteration $k$ so that,
for all nonnegative integers $p$, the point $x_{k+p+1}$ is an unsuccessful
iteration point, i.e.,


$$r_{k+p} < \mu, \quad p = 0, 1, 2, \ldots \qquad (13)$$


In this case, from Step 3 of Algorithm 1, we have
$\Delta_{k+p+1} \le \sigma_0^{p+1} \Delta_k$.
This inequality together with the definition of $\Delta_k$ implies that:

$$\lim_{p \to \infty} \Delta_{k+p+1} = 0. \qquad (14)$$


Therefore, from Lemma 1, Remark 1 and (12), we have


$$\begin{aligned}
\left|\frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{\text{Pred}_{k+p}} - 1\right| &= \left|\frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p}) - \text{Pred}_{k+p}}{\text{Pred}_{k+p}}\right| \\
&\le \frac{O(\|d_{k+p}\|^2)}{\frac{1}{2} \|g_{k+p}\| \min\left\{\Delta_{k+p}, \frac{\|g_{k+p}\|}{\gamma_{k+p}}\right\}} \\
&\le \frac{O(\Delta_{k+p}^2)}{\frac{1}{2} \varsigma \min\left\{\Delta_{k+p}, \frac{\varsigma}{\max\{1/\epsilon, \theta_2\}}\right\}}.
\end{aligned}$$


This implies that
$\left|\frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{\text{Pred}_{k+p}} - 1\right| \to 0$
as $p \to \infty$. Thus, for sufficiently large $p$, using Lemma 3, we have


$$r_{k+p} = \frac{R_{k+p} - f(x_{k+p} + d_{k+p})}{\text{Pred}_{k+p}} \ge \frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{\text{Pred}_{k+p}} \ge \mu,$$


which contradicts $r_{k+p} < \mu$. This completes the proof of the lemma.

Lemma 4 implies that the inner loop in Steps 2-3 of Algorithm 1 terminates
after a finite number of iterations, and therefore, Algorithm 1 is
well-defined.

The following theorem provides the global convergence property of Algo- 
rithm 1 under some suitable and standard assumptions. 


Theorem 1. Suppose that Assumption A1 holds and $\{x_k\}$ is the sequence
generated by Algorithm 1. Then, Algorithm 1 either stops at a stationary
point or


$$\liminf_{k \to \infty} \|g_k\| = 0. \qquad (15)$$


Proof. Suppose that Algorithm 1 does not stop at a stationary point. We
show that (15) holds for the infinite sequence $\{x_k\}$. Assume that, on
the contrary, there exists a positive constant $\varsigma$ so that


$$\|g_k\| \ge \varsigma > 0, \quad \forall k. \qquad (16)$$


Using Lemma 4, Algorithm 1 is well-defined and the inner loop in Steps 2-3
terminates after a finite number of iterations. Therefore, we may assume
that $r_k \ge \mu$. Now, from (9) and Lemma 1, we have

$$\begin{aligned}
R_k - f_{k+1} \ge \mu \text{Pred}_k &\ge \frac{1}{2} \mu \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\gamma_k}\right\} \\
&\ge \frac{1}{2} \mu \varsigma \min\left\{\Delta_k, \frac{\varsigma}{\max\{1/\epsilon, \theta_2\}}\right\} \ge 0. \qquad (17)
\end{aligned}$$


By taking the limit of both sides of this inequality as $k \to \infty$, and
using Lemma 3, we conclude that


$$\Delta_k = \nu_k \frac{\|g_k\|}{\gamma_k} \to 0. \qquad (18)$$


Now, using Remark 1 and (16), (18) implies that

$$\lim_{k \to \infty} \nu_k = 0. \qquad (19)$$

Therefore, from (16) and Lemmas 1 and 2, we have


$$\begin{aligned}
\left|\frac{f(x_k) - f(x_k + d_k)}{\text{Pred}_k} - 1\right| &= \left|\frac{f(x_k) - f(x_k + d_k) - \text{Pred}_k}{\text{Pred}_k}\right| \\
&\le \frac{O(\|d_k\|^2)}{\frac{1}{2} \|g_k\| \min\left\{\Delta_k, \frac{\|g_k\|}{\gamma_k}\right\}} \\
&\le \frac{O(\Delta_k^2)}{\frac{1}{2} \varsigma \min\left\{\Delta_k, \frac{\varsigma}{\max\{1/\epsilon, \theta_2\}}\right\}} \to 0,
\end{aligned}$$

which implies that

$$r_k = \frac{R_k - f(x_k + d_k)}{\text{Pred}_k} \ge \frac{f(x_k) - f(x_k + d_k)}{\text{Pred}_k} \ge \mu_1. \qquad (20)$$


This shows that, for sufficiently large $k$, we have successful iterations.
Therefore, there exists a positive constant $\nu^*$ so that, for
sufficiently large $k$, $\nu_k \ge \nu^*$. This contradicts (19).

Under some extra assumptions on the problem and following the same line of
proof as Theorem 3.7 in [30], one can establish the superlinear convergence
rate of the sequence $\{x_k\}$, generated by Algorithm 1, to its limit point
$x^*$.


4 Numerical results 


In this section, we report computational results of applying Algorithm 1,
denoted by FATRA, along with the following algorithms, on some test problems
in order to compare their performance:

• NATRM: Algorithm 2.1 in [30];


• NATRA: Algorithm 2.1 in [30] in which the nonmonotone term in computing
the trust region ratio $r_k$ is replaced by $R_k$, as given by (4);


• FATRM: Algorithm 1 in which the nonmonotone term in computing the trust
region ratio $r_k$ is replaced by $f_{l(k)}$, as given by (5).


All the algorithms are implemented in MATLAB 7.10.0 (R2010a) environ- 
ment on a PC with CPU 2.0 GHz and 4GB RAM memory and double pre- 
cision format. The following parameters are considered in the relevant algo- 
rithms: 
b= O.1, pa = 0.25, M2 = 0.75, €min sand ime €max = 10%, Amax aad 100, M= 10, 
m= = 05,0, S47 S4 ray = 0,1 = 0.25,e[e€=10 §5=16-*, 


Moreover, in Step 5 of Algorithm 1, if $\gamma_{k+1} \le \epsilon$, then we
set $\theta_1 = \epsilon$; if $\gamma_{k+1} \ge 1/\epsilon$, then we set
$\theta_2 = 1/\epsilon$. The simple subproblem at each iteration is solved
by the procedure described in Remark 2. All the algorithms
are being stopped either ||gx|| < 10~°, or the number of iterations and/or 
function evaluations exceeds 50000. In the latter case, we declare that the
algorithm has failed. The considered test problems are those in [30] as well
as some large-scale problems taken from [16] and [4]. We have also used the
performance profile of Dolan and Moré [10] to compare the performance of the
considered algorithms.
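
For readers unfamiliar with performance profiles, the following Python sketch shows one common way to compute them: for each problem, the cost of every solver is divided by the best cost on that problem, and the profile of a solver at $\tau$ is the fraction of problems whose ratio is at most $\tau$. The function name and the cost table are hypothetical and are not taken from Table 1; a failure is encoded as `np.inf`, and each problem is assumed to be solved by at least one solver with a positive cost.

```python
import numpy as np


def performance_profile(costs, taus):
    """Return rho[s, j]: fraction of problems solver s solves within a factor taus[j] of the best.

    costs[p, s] is the cost (iterations, evaluations or time) of solver s on
    problem p, with np.inf marking a failure.
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)   # best cost per problem
    ratios = costs / best                     # performance ratios, >= 1
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(costs.shape[1])])


# Made-up costs for 3 problems (rows) and 2 solvers (columns).
costs = [[10.0, 12.0],
         [20.0, 15.0],
         [np.inf, 30.0]]
rho = performance_profile(costs, taus=[1.0, 1.5, 2.0])
```

The value of a profile at $\tau = 1$ is the share of problems on which that solver attains the lowest cost, and its limit for large $\tau$ is the share of problems it solves at all.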

Numerical results are given in Table 1. In this table, Prob stands for the
problem name, and $n_i$, $n_f$ and $f_{opt}$ denote the number of
iterations, the number of function evaluations and the optimum value of the
objective function, respectively. It should be noted that the number of
gradient evaluations is almost the same as $n_i$.

Figures 1 and 2 show the performance profiles of the results in Table 1
based on the number of iterations and function evaluations, respectively. A
glance at Figure 1 shows that, in terms of $n_i$, FATRA solves all the
considered test problems successfully, while the other algorithms have at
least one failure in their runs. Moreover, FATRA and FATRM solve roughly 67%
and 61% of the problems with the lowest value of $n_i$, respectively. These
percentages for NATRM and NATRA are 49% and 47%, respectively. Figure 2 is
drawn based on $n_f$ of the results in Table 1. This figure shows that FATRA
solves all the problems successfully while FATRM has one failure in its run.
Moreover, NATRM and NATRA solve roughly 96% and 98% of the test problems
successfully. On the other hand, FATRA and FATRM solve about 58% and 60% of
the test problems with the lowest value of $n_f$, while these percentages
for NATRM and NATRA are about 34% and 22%.

Besides the performance profiles of the considered algorithms based on $n_i$
and $n_f$, we have stored the average CPU time over 20 runs for each
algorithm and drawn the performance profile of the considered algorithms
based on CPU time in Figure 3. The result shows that FATRA works well in
this regard too. Based on the above arguments, one can see that FATRA is
competitive with FATRM, NATRM and NATRA in terms of $n_i$, $n_f$ and CPU
time. Moreover, the performance of FATRM is very close to that of FATRA.

Figure 3: Performance profile of considered algorithms based on CPU time


5 Conclusion 


In this paper, a new nonmonotone adaptive trust region method for solving
unconstrained optimization problems based on a simple subproblem is
presented. The proposed algorithm combines the advantage of the adaptive
trust region method, as proposed in [5], with the nonmonotone term, as
suggested in [2]. The global convergence property of the proposed method is
established under some standard assumptions. Numerical results on some
large-scale test problems confirm the efficiency and effectiveness of the
proposed algorithm in comparison with some other existing algorithms in the
literature.